New asymmetric iterative scaling models for the generation of textual word maps
Abstract
The iterative spring model (Kopcsa and Schiebel, 1998) is a multidimensional scaling (MDS) algorithm based on point-mass mechanics. It embeds objects in a two-dimensional Euclidean space, making it possible to visualize object relationships and cluster structure. The technique assumes that the similarity matrix of the data set under consideration is symmetric. However, asymmetric proximities arise in many interesting problems, such as text mining. In this work we propose several improvements to the algorithm so that it can deal with asymmetric dissimilarities. The clustering quality and distance preservation of the resulting word maps are evaluated through objective measures. According to these measures, the new asymmetric algorithms outperform both their symmetric counterpart and other widely used multidimensional scaling methods.
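To make the idea of a spring-based embedding concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it is a generic symmetric spring layout in which every pair of points is joined by a spring whose rest length is the target dissimilarity. The function names (`symmetrize`, `stress`, `spring_embed`), the step size, and the iteration count are assumptions chosen for illustration. Averaging the two directions of an asymmetric matrix is only the naive baseline; it throws away exactly the asymmetry that the proposed models are designed to use.

```python
import math
import random


def symmetrize(d):
    """Average d[i][j] and d[j][i] (a naive baseline that discards asymmetry)."""
    n = len(d)
    return [[0.5 * (d[i][j] + d[j][i]) for j in range(n)] for i in range(n)]


def stress(pos, d):
    """Sum of squared gaps between map distances and target dissimilarities."""
    n = len(d)
    return sum(
        (math.dist(pos[i], pos[j]) - d[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def spring_embed(d, iters=500, step=0.05, seed=0):
    """Embed n objects in 2D by repeatedly moving each point along its net spring force.

    d must be a symmetric n x n dissimilarity matrix with a zero diagonal.
    """
    rng = random.Random(seed)
    n = len(d)
    # Random start; the same seed always gives the same initial layout.
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iters):
        forces = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                # Positive stretch: the points are too far apart, so i is pulled toward j.
                stretch = dist - d[i][j]
                forces[i][0] += stretch * dx / dist
                forces[i][1] += stretch * dy / dist
        # Update all points at once, after every force has been computed.
        for i in range(n):
            pos[i][0] += step * forces[i][0]
            pos[i][1] += step * forces[i][1]
    return pos
```

A typical call would symmetrize an asymmetric matrix first, for example `spring_embed(symmetrize(d))`, and then compare `stress` before and after to check that the layout has improved.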
Related articles
A probabilistic topic model based on local word relationships in overlapping windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, extracting the topics from the given documents. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
A new 2D block ordering system for wavelet-based multi-resolution up-scaling
A complete and accurate analysis of the complex spatial structure of heterogeneous hydrocarbon reservoirs requires detailed geological models, i.e. fine-resolution models. Due to the high computational cost of simulating such models, single-resolution up-scaling techniques are commonly used to reduce the volume of the simulated models at the expense of precision. Several multi-scale ...
Scaling and Fractal Concepts in Saturated Hydraulic Conductivity: Comparison of Some Models
Measurement of soil saturated hydraulic conductivity, Ks, is normally affected by flow patterns such as macro pore; however, most current techniques do not differentiate flow types, causing major problems in describing water and chemical flows within the soil matrix. This study compares eight models for scaling Ks and predicted matrix and macro pore Ks, using a database composed of 50 datasets...
Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents’ contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...